Decision-Level Fusion for Audio-Visual Laughter Detection
Abstract
Laughter is a highly variable signal that can express a spectrum of emotions. This makes the automatic detection of laughter a challenging but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audio-visual laughter detection is performed by combining (fusing) the results of separate audio and video classifiers on the decision level. The video classifier uses features based on the principal components of 20 tracked facial points; for audio we use the commonly used PLP and RASTA-PLP features. Our results indicate that RASTA-PLP features outperform PLP features for laughter detection in audio. We compared classifiers based on hidden Markov models (HMMs), Gaussian mixture models (GMMs) and support vector machines (SVMs), and found that RASTA-PLP features combined with a GMM gave the best performance for the audio modality. The video features classified with an SVM gave the best single-modality performance. Fusion on the decision level resulted in laughter detection with significantly better performance than single-modality classification.
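Read as a recipe, the abstract suggests a straightforward pipeline: per-class GMMs over audio features, an SVM over PCA-projected facial points, and a combination of the two classifiers' outputs at the decision level. The sketch below illustrates that idea only; it assumes the RASTA-PLP vectors and tracked facial-point coordinates are already available as NumPy arrays, and the function names, model settings, and fusion weight are illustrative placeholders rather than the configuration reported in the paper.

```python
# Minimal sketch of the described pipeline (not the authors' implementation):
# assumes RASTA-PLP vectors and tracked facial-point coordinates are already
# extracted; class labels are e.g. 0 = non-laughter, 1 = laughter.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def train_audio_gmms(X_audio, y, n_components=8):
    """Fit one GMM per class on (RASTA-)PLP feature vectors."""
    return {c: GaussianMixture(n_components=n_components, covariance_type="diag",
                               random_state=0).fit(X_audio[y == c])
            for c in np.unique(y)}


def audio_posteriors(gmms, X_audio):
    """Turn per-class GMM log-likelihoods into posterior-like scores via softmax."""
    loglik = np.column_stack([gmms[c].score_samples(X_audio) for c in sorted(gmms)])
    loglik -= loglik.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(loglik)
    return p / p.sum(axis=1, keepdims=True)


def train_video_svm(X_points, y, n_pcs=10):
    """Project facial-point coordinates onto principal components, then fit an SVM."""
    pca = PCA(n_components=n_pcs).fit(X_points)
    svm = SVC(kernel="rbf", probability=True, random_state=0)
    svm.fit(pca.transform(X_points), y)
    return pca, svm


def fuse_decisions(p_audio, p_video, w_audio=0.5):
    """Decision-level fusion: weighted sum of the two modalities' posteriors."""
    p = w_audio * p_audio + (1.0 - w_audio) * p_video
    return p.argmax(axis=1)  # columns follow the sorted class labels


# Example wiring (array names are placeholders):
# gmms = train_audio_gmms(X_audio_train, y_train)
# pca, svm = train_video_svm(X_points_train, y_train)
# p_a = audio_posteriors(gmms, X_audio_test)
# p_v = svm.predict_proba(pca.transform(X_points_test))
# y_hat = fuse_decisions(p_a, p_v, w_audio=0.6)
```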
Similar resources
Decision-Level Fusion for Audio-Visual Laughter Detection
Laughter is a highly variable signal, which can be caused by a spectrum of emotions. This makes the automatic detection of laughter a challenging, but interesting task. We perform automatic laughter detection using audio-visual data from the AMI Meeting Corpus. Audiovisual laughter detection is performed by fusing the results of separate audio and video classifiers on the decision level. This r...
Combining acoustic and visual features to detect laughter in adults' speech
Laughter can not only convey the affective state of the speaker but also be perceived differently based on the context in which it is used. In this paper, we focus on detecting laughter in adults’ speech using the MAHNOB laughter database. The paper explores the use of novel long-term acoustic features to capture the periodic nature of laughter and the use of computer vision-based smile feature...
Comparison of single-model and multiple-model prediction-based audiovisual fusion
Prediction-based fusion is a recently proposed audiovisual fusion approach which outperforms feature-level fusion on laughter-vs-speech discrimination. One set of predictive models is trained per class, learning the audio-to-visual and visual-to-audio feature mappings together with the time evolution of the audio and visual features. Classification of a new input is performed via prediction; a toy sketch of this idea appears after the last entry in this list. All ...
Audio-visual Laughter Synthesis System
In this paper we present an overview of a project aiming at building an audio-visual laughter synthesis system. The same approach is followed for acoustic and visual synthesis. First, a database was built with synchronous audio and 3D visual landmark tracking data. This data was then used to build separate HMM models of acoustic laughter and visual laughter. Visual laughter model...
متن کاملDemonstrating Laughter Detection in Natural Discourses
This work focuses on demonstrating previously achieved results in the automatic detection of laughter from natural discourses. In the previous work, features of two different modalities, namely audio and video from unobtrusive sources, were used to build a system of recurrent neural networks called Echo State Networks to model the dynamics of laughter. This model was then again utilized t...
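The "classification via prediction" mechanism mentioned in the prediction-based fusion entry above can be illustrated in a few lines. The sketch below is not the cited paper's model (which also learns the temporal evolution of the features); it only shows the core idea, with ridge regressors standing in for the per-class predictive models and all array names being placeholders.

```python
# Toy classification-by-prediction sketch (see the prediction-based fusion entry
# above); ridge regressors are a stand-in for the per-class predictive models.
import numpy as np
from sklearn.linear_model import Ridge


def train_prediction_models(X_audio, X_video, y):
    """Per class, learn audio-to-video and video-to-audio feature mappings."""
    models = {}
    for c in np.unique(y):
        a, v = X_audio[y == c], X_video[y == c]
        models[c] = (Ridge(alpha=1.0).fit(a, v),   # audio -> video
                     Ridge(alpha=1.0).fit(v, a))   # video -> audio
    return models


def classify_by_prediction(models, clip_audio, clip_video):
    """Assign a clip to the class whose models reconstruct its features best."""
    errors = {}
    for c, (a2v, v2a) in models.items():
        err_video = np.mean((a2v.predict(clip_audio) - clip_video) ** 2)
        err_audio = np.mean((v2a.predict(clip_video) - clip_audio) ** 2)
        errors[c] = err_video + err_audio
    return min(errors, key=errors.get)
```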
Publication date: 2007